尽管基于计划的序列建模方法在连续控制方面表现出巨大的潜力,但由于高维空间中规划的高度计算复杂性和天生的困难,将它们扩展到高维状态序列仍然是一个开放的挑战。我们提出了轨迹自动编码计划器(TAP),这是一种基于计划的序列建模RL方法,可扩展到高州行动维度。使用状态条件矢量定量的变分自动编码器(VQ-VAE),点击模拟给定当前状态的轨迹的条件分布。当部署为RL代理时,TAP避免在高维连续动作空间中逐步计划,而是通过Beam Search寻找最佳的潜在代码序列。与$ o(d^3)$轨迹变压器的复杂性不同,TAP享受常数$ o(c)$规划有关州行动维度$ d $的计算复杂性。我们的经验评估还表明,随着维度的增长,TAP的表现越来越强。对于具有较高状态和动作维度的ADROIT机器人手动操纵任务,TAP超过了基于模型的方法,包括TT,其边距很大,并且还击败了强大的无模型参与者 - 批评基准。
translated by 谷歌翻译
在训练数据的分布中评估时,学到的模型和政策可以有效地概括,但可以在分布输入输入的情况下产生不可预测且错误的输出。为了避免在部署基于学习的控制算法时分配变化,我们寻求一种机制将代理商限制为类似于受过训练的国家和行动的机制。在控制理论中,Lyapunov稳定性和控制不变的集合使我们能够保证稳定系统周围系统的控制器,而在机器学习中,密度模型使我们能够估算培训数据分布。我们可以将这两个概念结合起来,产生基于学习的控制算法,这些算法仅使用分配动作将系统限制为分布状态?在这项工作中,我们建议通过结合Lyapunov稳定性和密度估计的概念来做到这一点,引入Lyapunov密度模型:控制Lyapunov函数和密度模型的概括,这些函数和密度模型可以保证代理商在其整个轨迹上保持分布的能力。
translated by 谷歌翻译
Model-based reinforcement learning methods often use learning only for the purpose of estimating an approximate dynamics model, offloading the rest of the decision-making work to classical trajectory optimizers. While conceptually simple, this combination has a number of empirical shortcomings, suggesting that learned models may not be well-suited to standard trajectory optimization. In this paper, we consider what it would look like to fold as much of the trajectory optimization pipeline as possible into the modeling problem, such that sampling from the model and planning with it become nearly identical. The core of our technical approach lies in a diffusion probabilistic model that plans by iteratively denoising trajectories. We show how classifier-guided sampling and image inpainting can be reinterpreted as coherent planning strategies, explore the unusual and useful properties of diffusion-based planning methods, and demonstrate the effectiveness of our framework in control settings that emphasize long-horizon decision-making and test-time flexibility.
translated by 谷歌翻译
强化学习(RL)通常涉及估计静止政策或单步模型,利用马尔可夫属性来解决问题。但是,我们也可以将RL视为通用序列建模问题,目标是产生一系列导致一系列高奖励的动作。通过这种方式观看,考虑在其他域中运用良好的高容量序列预测模型,例如自然语言处理,也可以为RL问题提供有效的解决方案。为此,我们探索如何使用变压器架构与序列建模的工具来解决RL,以将分布在轨迹上和将光束搜索作为规划算法进行重新定位。框架RL作为序列建模问题简化了一系列设计决策,允许我们分配在离线RL算法中常见的许多组件。我们展示了这种方法跨越长地平动态预测,仿制学习,目标条件的RL和离线RL的灵活性。此外,我们表明这种方法可以与现有的无模型算法结合起来,以在稀疏奖励,长地平线任务中产生最先进的策划仪。
translated by 谷歌翻译
我们介绍了$ \ Gamma $ -Model,一种具有无限概率的环境动态的预测模型。用$ \ gamma $ -models替换标准的单步模型导致程序中概括为基于模型的控制,包括模型卷展栏和基于模型的值估计。$ \ Gamma $ -Model,具有经常对时间差异学习的生成重新诠释的,是继任者表示的自然连续模拟和模型和基于模型的机制之间的混合。与价值函数一样,它包含有关长期未来的信息;与标准预测模型一样,它与任务奖励无关。我们将$ \ Gamma $ -Model实例化为生成的对抗网络和规范化流程,讨论其培训如何反映训练时间和测试时间复合错误之间的不可避免的权衡,并经验证明其效用进行预测和控制。
translated by 谷歌翻译
设计有效的基于模型的增强学习算法很困难,因为必须对模型生成数据的偏置权衡数据生成的易用性。在本文中,我们研究了模型使用在理论上和经验上的政策优化中的作用。我们首先制定和分析一种基于模型的加强学习算法,并在每个步骤中保证单调改善。在实践中,该分析过于悲观,并表明实际的脱助策略数据总是优选模拟策略数据,但我们表明可以将模型概括的经验估计纳入这样的分析以证明模型使用证明模型使用。通过这种分析的动机,我们证明,使用从真实数据分支的短模型生成的卷展栏的简单过程具有更复杂的基于模型的算法而没有通常的缺陷的效益。特别是,这种方法超越了基于模型的方法的样本效率,匹配了最佳无模型算法的渐近性能,并缩放到导致其他基于模型的方法完全失败的视野。
translated by 谷歌翻译
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
translated by 谷歌翻译
We present a dynamic path planning algorithm to navigate an amphibious rotor craft through a concave time-invariant obstacle field while attempting to minimize energy usage. We create a nonlinear quaternion state model that represents the rotor craft dynamics above and below the water. The 6 degree of freedom dynamics used within a layered architecture to generate motion paths for the vehicle to follow and the required control inputs. The rotor craft has a 3 dimensional map of its surroundings that is updated via limited range onboard sensor readings within the current medium (air or water). Path planning is done via PRM and D* Lite.
translated by 谷歌翻译
While the capabilities of autonomous systems have been steadily improving in recent years, these systems still struggle to rapidly explore previously unknown environments without the aid of GPS-assisted navigation. The DARPA Subterranean (SubT) Challenge aimed to fast track the development of autonomous exploration systems by evaluating their performance in real-world underground search-and-rescue scenarios. Subterranean environments present a plethora of challenges for robotic systems, such as limited communications, complex topology, visually-degraded sensing, and harsh terrain. The presented solution enables long-term autonomy with minimal human supervision by combining a powerful and independent single-agent autonomy stack, with higher level mission management operating over a flexible mesh network. The autonomy suite deployed on quadruped and wheeled robots was fully independent, freeing the human supervision to loosely supervise the mission and make high-impact strategic decisions. We also discuss lessons learned from fielding our system at the SubT Final Event, relating to vehicle versatility, system adaptability, and re-configurable communications.
translated by 谷歌翻译
We present Muse, a text-to-image Transformer model that achieves state-of-the-art image generation performance while being significantly more efficient than diffusion or autoregressive models. Muse is trained on a masked modeling task in discrete token space: given the text embedding extracted from a pre-trained large language model (LLM), Muse is trained to predict randomly masked image tokens. Compared to pixel-space diffusion models, such as Imagen and DALL-E 2, Muse is significantly more efficient due to the use of discrete tokens and requiring fewer sampling iterations; compared to autoregressive models, such as Parti, Muse is more efficient due to the use of parallel decoding. The use of a pre-trained LLM enables fine-grained language understanding, translating to high-fidelity image generation and the understanding of visual concepts such as objects, their spatial relationships, pose, cardinality etc. Our 900M parameter model achieves a new SOTA on CC3M, with an FID score of 6.06. The Muse 3B parameter model achieves an FID of 7.88 on zero-shot COCO evaluation, along with a CLIP score of 0.32. Muse also directly enables a number of image editing applications without the need to fine-tune or invert the model: inpainting, outpainting, and mask-free editing. More results are available at https://muse-model.github.io
translated by 谷歌翻译